new: Deploy and monitor ML models with GPUs on Amazon EKS #1020
base: main
✅ Deploy Preview for eks-workshop ready!
Karpenter should be preinstalled in this lab, since installing it manually doesn't really add much. Take a look at the Inference with AWS Inferentia lab (https://www.eksworkshop.com/docs/aiml/inferentia/), which comes with Karpenter preinstalled.
In the Jupyter Notebook commands section, it would be nice to get some explanation of what exactly this code is doing. This would also enable the user to read through the explanation while waiting for the code to execute.
Other than that, this lab looks really good! Thank you for creating it.
This looks good to me.
Awesome! Some comments:
- The prepare-environment block should have a link out to the Terraform like in other modules
- Please explain the hardware infrastructure being used. Can we outline the Karpenter nodepools that are created for the user? Explain what g5 instances are and why we need them for the lab. Example: https://eksworkshop.com/docs/aiml/chatbot/nodepool
- The titles for the AI/ML modules are not distinct enough. How about something like "Training StableDiffusion on NVIDIA GPUs"? Deployment and inference are implied when we're doing training.
What this PR does / why we need it:
This is a new lab to deploy and monitor an ML model on Amazon EKS.
Which issue(s) this PR fixes:
First PR for this new lab
Fixes # NA
Quality checks
My content adheres to the style guidelines
I ran make test module="<module>" and it was successful (see https://github.com/aws-samples/eks-workshop-v2/blob/main/docs/automated_tests.md)
EKS Workshop
AI/ML on EKS
Deploy and Monitor GenAI Model on EKS
✔ Deploy and Monitor GenAI Model on EKS (1342913ms)
✔ Install Karpenter and KubeRay Operator (211251ms)
✔ Install Jupyterhub (444619ms)
✔ Model Training (60899ms)
✔ Model Inference (312600ms)
✔ Monitor GPU Workloads on EKS (4945ms)
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.